Overview

Dataset statistics

Number of variables: 26
Number of observations: 10000
Missing cells: 42
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 764.2 B
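As a cross-check, the overview's missing-cell figures follow from the per-variable profiles below. This sketch assumes nothing beyond this report: the counts are copied by hand from the five variables whose missing count is non-zero.

```python
# Missing-value counts copied from the variable profiles in this report;
# the other 21 variables each report "Missing: 0".
missing_by_variable = {
    "annual_inc": 1,
    "open_acc": 5,
    "revol_util": 26,
    "total_acc": 5,
    "cr_line_yrs": 5,
}
n_variables = 26
n_observations = 10_000

missing_cells = sum(missing_by_variable.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 42
print(f"{missing_pct:.3f}%")  # 0.016%, which the overview rounds to "< 0.1%"
```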

Variable types

NUM: 13
CAT: 11
BOOL: 2

Reproduction

Analysis started: 2020-11-05 03:37:02.504344
Analysis finished: 2020-11-05 03:37:55.159775
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

zip_code has high cardinality: 720 distinct values [High cardinality]
emp_length has 250 (2.5%) zeros [Zeros]
delinq_2yrs has 8915 (89.1%) zeros [Zeros]
inq_last_6mths has 4607 (46.1%) zeros [Zeros]
mths_since_last_delinq has 6479 (64.8%) zeros [Zeros]
mths_since_last_record has 267 (2.7%) zeros [Zeros]
revol_bal has 278 (2.8%) zeros [Zeros]
revol_util has 254 (2.5%) zeros [Zeros]
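The report does not say which thresholds produced these warnings. The profiles below do bound them: 2.5% zeros triggers a warning while debt_to_income's 0.6% does not, and addr_state's 50 distinct values are not flagged as high cardinality. The sketch uses hypothetical thresholds (more than 1% zeros, more than 50 distinct values) chosen only because they reproduce the list above; they are assumptions, not values read from config.yaml.

```python
# Hypothetical thresholds (assumptions, not read from config.yaml), chosen
# only because they reproduce the warnings listed in this report.
ZEROS_THRESHOLD_PCT = 1.0    # warn when more than 1% of values are zero
CARDINALITY_THRESHOLD = 50   # warn when a categorical has more than 50 distinct values

# Zero percentages of the numeric variables that contain zeros, copied from this report.
zeros_pct = {
    "emp_length": 2.5,
    "debt_to_income": 0.6,
    "delinq_2yrs": 89.1,
    "inq_last_6mths": 46.1,
    "mths_since_last_delinq": 64.8,
    "mths_since_last_record": 2.7,
    "revol_bal": 2.8,
    "revol_util": 2.5,
}
# Distinct counts of the largest categorical variables, copied from this report.
distinct_counts = {"zip_code": 720, "addr_state": 50, "purpose_cat": 27, "home_ownership": 5}


def warnings_for(zeros_pct, distinct_counts):
    """Return (variable, warning) pairs under the hypothetical thresholds."""
    found = [(name, "High cardinality")
             for name, n in distinct_counts.items() if n > CARDINALITY_THRESHOLD]
    found += [(name, "Zeros")
              for name, pct in zeros_pct.items() if pct > ZEROS_THRESHOLD_PCT]
    return found


for name, kind in warnings_for(zeros_pct, distinct_counts):
    print(f"{name}: {kind}")
```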

Variables

is_bad
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
0 8705 87.1%
1 1295 13.0%

emp_length
Real number (ℝ≥0)

ZEROS
Distinct count: 14
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8639
Minimum: 0
Maximum: 33
Zeros: 250
Zeros (%): 2.5%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 2
median: 4
Q3: 8
95th percentile: 10
Maximum: 33
Range: 33
Interquartile range (IQR): 6
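These quantiles can be reproduced from the emp_length value counts listed further down in this profile. A minimal sketch using Python's standard `statistics` module, assuming linear interpolation between order statistics (the default convention in pandas and NumPy); the variable names are illustrative.

```python
import statistics

# emp_length value counts, copied from the frequency tables of this profile.
emp_length_counts = {
    0: 250, 1: 2083, 2: 1183, 3: 1010, 4: 889, 5: 779, 6: 535,
    7: 421, 8: 351, 9: 331, 10: 2160, 11: 2, 22: 5, 33: 1,
}
# Expand the counts back into the 10000 individual observations.
values = sorted(v for v, c in emp_length_counts.items() for _ in range(c))

# method="inclusive" interpolates linearly between order statistics at
# position p * (n - 1), the same convention as pandas' and NumPy's defaults.
percentiles = statistics.quantiles(values, n=100, method="inclusive")
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

summary = {
    "Minimum": values[0],
    "5th percentile": percentiles[4],
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "95th percentile": percentiles[94],
    "Maximum": values[-1],
    "Range": values[-1] - values[0],
    "Interquartile range (IQR)": q3 - q1,
}
for label, value in summary.items():
    print(f"{label}: {value}")
```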

Descriptive statistics

Standard deviation: 3.492219402
Coefficient of variation (CV): 0.7179875001
Kurtosis: -0.7128269405
Mean: 4.8639
Mean absolute deviation (MAD): 3.0606037
Skewness: 0.4600047179
Sum: 48639
Variance: 12.19559635
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 7.5 9.5 10.5 33. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
10 2160 21.6%
1 2083 20.8%
2 1183 11.8%
3 1010 10.1%
4 889 8.9%
5 779 7.8%
6 535 5.3%
7 421 4.2%
8 351 3.5%
9 331 3.3%
Other values (4) 258 2.6%

Minimum 5 values
Value Count Frequency (%)
0 250 2.5%
1 2083 20.8%
2 1183 11.8%
3 1010 10.1%
4 889 8.9%

Maximum 5 values
Value Count Frequency (%)
33 1 < 0.1%
22 5 0.1%
11 2 < 0.1%
10 2160 21.6%
9 331 3.3%
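The descriptive statistics for emp_length can likewise be recomputed from its value counts. A minimal sketch, assuming the sample (n - 1) convention for variance and standard deviation, which reproduces the reported figures. It also computes both absolute-deviation measures: the reported MAD (3.0606037) equals the mean absolute deviation around the mean, whereas the median absolute deviation around the median is 3.

```python
import statistics

# emp_length value counts, copied from the frequency tables above.
emp_length_counts = {
    0: 250, 1: 2083, 2: 1183, 3: 1010, 4: 889, 5: 779, 6: 535,
    7: 421, 8: 351, 9: 331, 10: 2160, 11: 2, 22: 5, 33: 1,
}
values = [v for v, c in emp_length_counts.items() for _ in range(c)]

mean = statistics.mean(values)
std = statistics.stdev(values)          # sample standard deviation (n - 1)
variance = statistics.variance(values)  # sample variance (n - 1)
cv = std / mean                         # coefficient of variation

# Mean absolute deviation around the mean: the report's MAD figure matches this.
mean_abs_dev = sum(abs(v - mean) for v in values) / len(values)

# Median absolute deviation around the median, for comparison.
median = statistics.median(values)
median_abs_dev = statistics.median(abs(v - median) for v in values)

print(f"Sum: {sum(values)}")
print(f"Mean: {mean}")
print(f"Standard deviation: {std:.9f}")
print(f"Coefficient of variation (CV): {cv:.10f}")
print(f"Variance: {variance:.8f}")
print(f"Mean absolute deviation: {mean_abs_dev:.7f}")
print(f"Median absolute deviation: {median_abs_dev}")
```

Skewness and kurtosis are left out because libraries differ in the bias corrections they apply.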
 

home_ownership
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
rent 4745 47.4%
mortgage 4445 44.5%
own 775 7.8%
other 34 0.3%
none 1 < 0.1%

Length

Max length: 8
Mean length: 5.7039
Min length: 3

Unicode categories
Value Count Frequency (%)
Lowercase_Letter 10 100.0%

Unicode scripts
Value Count Frequency (%)
Latin 10 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 10 100.0%
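The length and character tables can be approximated with the standard library. The 10 lowercase letters reported for home_ownership match the number of *distinct* characters across its five categories, so the sketch counts distinct characters; that reading is an inference from the numbers, not something the report states. Python's `unicodedata` module has no script lookup, so the Latin/Common script table is not reproduced. The helper names are illustrative.

```python
import unicodedata

# Category counts copied from the home_ownership profile above.
home_ownership_counts = {"rent": 4745, "mortgage": 4445, "own": 775, "other": 34, "none": 1}


def length_stats(counts):
    """Min, max and count-weighted mean string length of a categorical column."""
    n = sum(counts.values())
    lengths = [len(value) for value in counts]
    mean = sum(len(value) * c for value, c in counts.items()) / n
    return min(lengths), max(lengths), mean


def unicode_category_counts(counts):
    """Count the distinct characters in each Unicode general category.

    unicodedata.category returns short codes such as "Ll" (Lowercase_Letter),
    "Zs" (Space_Separator), "Pd" (Dash_Punctuation) and "Nd" (Decimal_Number).
    """
    distinct_chars = set("".join(counts))
    tally = {}
    for ch in distinct_chars:
        code = unicodedata.category(ch)
        tally[code] = tally.get(code, 0) + 1
    return tally


print(length_stats(home_ownership_counts))             # (3, 8, 5.7039)
print(unicode_category_counts(home_ownership_counts))  # {'Ll': 10}
```

The same functions applied to verification_status give lengths 12 to 24 with mean 16.5098, and 13 lowercase letters, one space and one hyphen, matching its profile.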
 

annual_inc
Real number (ℝ≥0)

Distinct count: 1901
Unique (%): 19.0%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 68203.01154
Minimum: 2000
Maximum: 900000
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 2000
5th percentile: 23734
Q1: 40000
median: 58000
Q3: 82000
95th percentile: 143550
Maximum: 900000
Range: 898000
Interquartile range (IQR): 42000

Descriptive statistics

Standard deviation: 48590.25276
Coefficient of variation (CV): 0.7124355899
Kurtosis: 51.15309953
Mean: 68203.01154
Mean absolute deviation (MAD): 30247.19103
Skewness: 4.880305421
Sum: 681961912.4
Variance: 2361012663
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
60000 381 3.8%
50000 267 2.7%
40000 222 2.2%
75000 213 2.1%
30000 211 2.1%
65000 204 2.0%
48000 196 2.0%
70000 193 1.9%
45000 181 1.8%
80000 170 1.7%
Other values (1891) 7761 77.6%

Minimum 5 values
Value Count Frequency (%)
2000 1 < 0.1%
4080 1 < 0.1%
4200 2 < 0.1%
4800 2 < 0.1%
5000 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
900000 2 < 0.1%
860000 1 < 0.1%
780000 1 < 0.1%
744000 1 < 0.1%
725000 1 < 0.1%

verification_status
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
not verified 4367 43.7%
verified - income 3214 32.1%
verified - income source 2419 24.2%

Length

Max length: 24
Mean length: 16.5098
Min length: 12

Unicode categories
Value Count Frequency (%)
Lowercase_Letter 13 86.7%
Space_Separator 1 6.7%
Dash_Punctuation 1 6.7%

Unicode scripts
Value Count Frequency (%)
Latin 13 86.7%
Common 2 13.3%

Unicode blocks
Value Count Frequency (%)
ASCII 15 100.0%

pymnt_plan
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
n 9998 > 99.9%
y 2 < 0.1%

purpose_cat
Categorical

Distinct count: 27
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
debt consolidation 4454 44.5%
credit card 1273 12.7%
other 1026 10.3%
home improvement 800 8.0%
major purchase 546 5.5%
small business 461 4.6%
car 349 3.5%
wedding 250 2.5%
medical 183 1.8%
moving 159 1.6%
Other values (17) 499 5.0%

Length

Max length: 33
Mean length: 13.9381
Min length: 3

Unicode categories
Value Count Frequency (%)
Lowercase_Letter 21 95.5%
Space_Separator 1 4.5%

Unicode scripts
Value Count Frequency (%)
Latin 21 95.5%
Common 1 4.5%

Unicode blocks
Value Count Frequency (%)
ASCII 22 100.0%

zip_code
Categorical

HIGH CARDINALITY
Distinct count: 720
Unique (%): 7.2%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
100xx 158 1.6%
112xx 141 1.4%
945xx 129 1.3%
070xx 125 1.2%
606xx 114 1.1%
900xx 107 1.1%
021xx 99 1.0%
941xx 95 0.9%
926xx 94 0.9%
331xx 93 0.9%
Other values (710) 8845 88.4%

Length

Max length: 5
Mean length: 5
Min length: 5

Unicode categories
Value Count Frequency (%)
Decimal_Number 10 90.9%
Lowercase_Letter 1 9.1%

Unicode scripts
Value Count Frequency (%)
Common 10 90.9%
Latin 1 9.1%

Unicode blocks
Value Count Frequency (%)
ASCII 11 100.0%
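For a high-cardinality variable such as zip_code, the frequency table keeps the ten most frequent values and folds the remaining distinct values into one "Other values (k)" row (710 of them here). A minimal sketch of that layout, with hypothetical helper names; the percentage formatting is inferred from how shares are printed in this report ("< 0.1%", "> 99.9%") and is not taken from pandas-profiling's source. emp_length serves as test data because its complete value counts appear in this report, so the folded row can be checked against its printed "Other values (4) 258 2.6%".

```python
def format_pct(count, total):
    """Format a share as this report appears to: one decimal place, with
    "< 0.1%" and "> 99.9%" for non-zero shares that would round to 0.0 or 100.0."""
    text = f"{100 * count / total:.1f}"
    if text == "0.0" and count > 0:
        return "< 0.1%"
    if text == "100.0" and count < total:
        return "> 99.9%"
    return text + "%"


def frequency_table(counts, top_n=10):
    """Rows of (value, count, share): the top_n most frequent values, then one
    "Other values (k)" row folding in the remaining k distinct values."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = [(str(v), c, format_pct(c, total)) for v, c in ranked[:top_n]]
    rest = ranked[top_n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, format_pct(other, total)))
    return rows


# emp_length value counts copied from this report.
emp_length_counts = {
    0: 250, 1: 2083, 2: 1183, 3: 1010, 4: 889, 5: 779, 6: 535,
    7: 421, 8: 351, 9: 331, 10: 2160, 11: 2, 22: 5, 33: 1,
}
for value, count, share in frequency_table(emp_length_counts):
    print(value, count, share)
```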
 

addr_state
Categorical

Distinct count: 50
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
ca 1748 17.5%
ny 958 9.6%
fl 714 7.1%
tx 700 7.0%
nj 482 4.8%
va 392 3.9%
il 386 3.9%
pa 378 3.8%
ga 357 3.6%
ma 331 3.3%
Other values (40) 3554 35.5%

Length

Max length: 2
Mean length: 2
Min length: 2

Unicode categories
Value Count Frequency (%)
Lowercase_Letter 24 100.0%

Unicode scripts
Value Count Frequency (%)
Latin 24 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 24 100.0%

debt_to_income
Real number (ℝ≥0)

Distinct count: 2585
Unique (%): 25.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.338704
Minimum: 0
Maximum: 29.99
Zeros: 58
Zeros (%): 0.6%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 2.129
Q1: 8.16
median: 13.41
Q3: 18.6925
95th percentile: 23.93
Maximum: 29.99
Range: 29.99
Interquartile range (IQR): 10.5325

Descriptive statistics

Standard deviation: 6.754211507
Coefficient of variation (CV): 0.5063619004
Kurtosis: -0.8546793248
Mean: 13.338704
Mean absolute deviation (MAD): 5.669516109
Skewness: -0.008777611376
Sum: 133387.04
Variance: 45.61937308
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.055 0.195 3.395 7.675 20.325 22.835 24.965 26.885 29.99 ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 58 0.6%
12.48 16 0.2%
13.51 13 0.1%
10 13 0.1%
19.2 13 0.1%
18.14 13 0.1%
4.8 12 0.1%
17.82 12 0.1%
15.38 12 0.1%
22.43 12 0.1%
Other values (2575) 9826 98.3%

Minimum 5 values
Value Count Frequency (%)
0 58 0.6%
0.11 1 < 0.1%
0.12 1 < 0.1%
0.13 1 < 0.1%
0.14 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
29.99 1 < 0.1%
29.93 1 < 0.1%
29.92 1 < 0.1%
29.83 1 < 0.1%
29.74 1 < 0.1%

delinq_2yrs
Real number (ℝ≥0)

ZEROS
Distinct count: 10
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1481
Minimum: 0
Maximum: 11
Zeros: 8915
Zeros (%): 89.1%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 1
Maximum: 11
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5061541358
Coefficient of variation (CV): 3.417651153
Kurtosis: 54.83739854
Mean: 0.1481
Mean absolute deviation (MAD): 0.2640623
Skewness: 5.640791298
Sum: 1481
Variance: 0.2561920092
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 5.5 11. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 8915 89.1%
1 822 8.2%
2 186 1.9%
3 50 0.5%
4 14 0.1%
5 6 0.1%
6 3 < 0.1%
7 2 < 0.1%
11 1 < 0.1%
8 1 < 0.1%

Minimum 5 values
Value Count Frequency (%)
0 8915 89.1%
1 822 8.2%
2 186 1.9%
3 50 0.5%
4 14 0.1%

Maximum 5 values
Value Count Frequency (%)
11 1 < 0.1%
8 1 < 0.1%
7 2 < 0.1%
6 3 < 0.1%
5 6 0.1%

inq_last_6mths
Real number (ℝ≥0)

ZEROS
Distinct count: 20
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0664
Minimum: 0
Maximum: 25
Zeros: 4607
Zeros (%): 46.1%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 1
Q3: 2
95th percentile: 4
Maximum: 25
Range: 25
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.475875625
Coefficient of variation (CV): 1.383979393
Kurtosis: 23.68251185
Mean: 1.0664
Mean absolute deviation (MAD): 1.01822448
Skewness: 3.116512638
Sum: 10664
Variance: 2.178208861
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 5.5 6.5 8.5 9.5 25. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 4607 46.1%
1 2684 26.8%
2 1431 14.3%
3 731 7.3%
4 227 2.3%
5 152 1.5%
6 76 0.8%
7 42 0.4%
8 27 0.3%
9 10 0.1%
Other values (10) 13 0.1%

Minimum 5 values
Value Count Frequency (%)
0 4607 46.1%
1 2684 26.8%
2 1431 14.3%
3 731 7.3%
4 227 2.3%

Maximum 5 values
Value Count Frequency (%)
25 1 < 0.1%
24 1 < 0.1%
18 2 < 0.1%
17 1 < 0.1%
16 1 < 0.1%

mths_since_last_delinq
Real number (ℝ≥0)

ZEROS
Distinct count: 91
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.222
Minimum: 0
Maximum: 120
Zeros: 6479
Zeros (%): 64.8%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 23
95th percentile: 65
Maximum: 120
Range: 120
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 21.99844788
Coefficient of variation (CV): 1.663776122
Kurtosis: 1.249060661
Mean: 13.222
Mean absolute deviation (MAD): 17.6746228
Skewness: 1.556097754
Sum: 132220
Variance: 483.9317092
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 11.5 48.5 76.5 82.5 120. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 6479 64.8%
30 69 0.7%
34 66 0.7%
23 65 0.7%
38 65 0.7%
24 64 0.6%
44 64 0.6%
20 63 0.6%
33 63 0.6%
18 61 0.6%
Other values (81) 2941 29.4%

Minimum 5 values
Value Count Frequency (%)
0 6479 64.8%
1 6 0.1%
2 29 0.3%
3 40 0.4%
4 37 0.4%

Maximum 5 values
Value Count Frequency (%)
120 1 < 0.1%
115 1 < 0.1%
97 1 < 0.1%
96 1 < 0.1%
95 1 < 0.1%

mths_since_last_record
Real number (ℝ≥0)

ZEROS
Distinct count: 94
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 114.1828
Minimum: 0
Maximum: 119
Zeros: 267
Zeros (%): 2.7%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 91
Q1: 119
median: 119
Q3: 119
95th percentile: 119
Maximum: 119
Range: 119
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 20.78681778
Coefficient of variation (CV): 0.1820485903
Kurtosis: 22.76765479
Mean: 114.1828
Mean absolute deviation (MAD): 8.85020928
Skewness: -4.838179104
Sum: 1141828
Variance: 432.0917933
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 3. 37. 83.5 118.5 119. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
119 9163 91.6%
0 267 2.7%
89 21 0.2%
116 18 0.2%
87 17 0.2%
92 17 0.2%
86 17 0.2%
104 16 0.2%
100 16 0.2%
114 16 0.2%
Other values (84) 432 4.3%

Minimum 5 values
Value Count Frequency (%)
0 267 2.7%
6 1 < 0.1%
11 1 < 0.1%
17 1 < 0.1%
20 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
119 9163 91.6%
118 11 0.1%
117 10 0.1%
116 18 0.2%
115 10 0.1%

open_acc
Real number (ℝ≥0)

Distinct count: 36
Unique (%): 0.4%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.334567284
Minimum: 1
Maximum: 39
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 6
median: 9
Q3: 12
95th percentile: 18
Maximum: 39
Range: 38
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.526589744
Coefficient of variation (CV): 0.4849276465
Kurtosis: 1.838467994
Mean: 9.334567284
Mean absolute deviation (MAD): 3.516796938
Skewness: 1.063599744
Sum: 93299
Variance: 20.49001471
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
7 1035 10.3%
6 990 9.9%
8 937 9.4%
9 929 9.3%
10 805 8.1%
5 763 7.6%
11 692 6.9%
4 631 6.3%
12 577 5.8%
13 487 4.9%
Other values (26) 2149 21.5%

Minimum 5 values
Value Count Frequency (%)
1 7 0.1%
2 163 1.6%
3 374 3.7%
4 631 6.3%
5 763 7.6%

Maximum 5 values
Value Count Frequency (%)
39 1 < 0.1%
36 2 < 0.1%
35 1 < 0.1%
33 3 < 0.1%
32 1 < 0.1%

pub_rec
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
0 9427 94.3%
1 550 5.5%
2 18 0.2%
3 5 0.1%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value Count Frequency (%)
Decimal_Number 4 100.0%

Unicode scripts
Value Count Frequency (%)
Common 4 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 4 100.0%

revol_bal
Real number (ℝ≥0)

ZEROS
Distinct count: 8130
Unique (%): 81.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14271.0074
Minimum: 0
Maximum: 1207359
Zeros: 278
Zeros (%): 2.8%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 277.95
Q1: 3524.5
median: 8645.5
Q3: 16952.25
95th percentile: 44554.85
Maximum: 1207359
Range: 1207359
Interquartile range (IQR): 13427.75

Descriptive statistics

Standard deviation: 25437.9082
Coefficient of variation (CV): 1.782488614
Kurtosis: 570.4140985
Mean: 14271.0074
Mean absolute deviation (MAD): 11728.71446
Skewness: 16.32424653
Sum: 142710074
Variance: 647087173.7
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 2.550000e+01 7.825000e+02 5.878000e+03 ... 8.222900e+04 1.205955e+05 1.727790e+05 2.836010e+05 1.207359e+06], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 278 2.8%
2227 6 0.1%
1763 6 0.1%
11628 5 0.1%
4801 5 0.1%
760 5 0.1%
5272 4 < 0.1%
18550 4 < 0.1%
15 4 < 0.1%
5220 4 < 0.1%
Other values (8120) 9679 96.8%

Minimum 5 values
Value Count Frequency (%)
0 278 2.8%
1 2 < 0.1%
3 2 < 0.1%
5 1 < 0.1%
6 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
1207359 1 < 0.1%
602519 1 < 0.1%
508961 1 < 0.1%
487589 1 < 0.1%
423189 1 < 0.1%

revol_util
Real number (ℝ≥0)

ZEROS
Distinct count: 1027
Unique (%): 10.3%
Missing: 26
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.450771
Minimum: 0
Maximum: 100.6
Zeros: 254
Zeros (%): 2.5%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 2.8
Q1: 25
median: 48.7
Q3: 71.8
95th percentile: 93.6
Maximum: 100.6
Range: 100.6
Interquartile range (IQR): 46.8

Descriptive statistics

Standard deviation: 28.22055724
Coefficient of variation (CV): 0.5824583727
Kurtosis: -1.099296594
Mean: 48.450771
Mean absolute deviation (MAD): 24.12794997
Skewness: -0.01672374423
Sum: 483247.99
Variance: 796.3998507
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
0 254 2.5%
46.6 21 0.2%
43.4 20 0.2%
0.1 20 0.2%
47.6 19 0.2%
56.8 19 0.2%
55.4 19 0.2%
53.6 19 0.2%
70 19 0.2%
31.4 18 0.2%
Other values (1017) 9546 95.5%
(Missing) 26 0.3%

Minimum 5 values
Value Count Frequency (%)
0 254 2.5%
0.03 1 < 0.1%
0.1 20 0.2%
0.12 1 < 0.1%
0.2 11 0.1%

Maximum 5 values
Value Count Frequency (%)
100.6 1 < 0.1%
100 1 < 0.1%
99.9 4 < 0.1%
99.8 5 0.1%
99.7 3 < 0.1%

total_acc
Real number (ℝ≥0)

Distinct count: 75
Unique (%): 0.8%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.01130565
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 13
median: 20
Q3: 29
95th percentile: 44
Maximum: 90
Range: 89
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.70939957
Coefficient of variation (CV): 0.5319720581
Kurtosis: 0.9238037612
Mean: 22.01130565
Mean absolute deviation (MAD): 9.292220477
Skewness: 0.8707976619
Sum: 220003
Variance: 137.1100383
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
15 369 3.7%
20 360 3.6%
17 360 3.6%
12 357 3.6%
14 351 3.5%
19 346 3.5%
16 340 3.4%
18 339 3.4%
13 331 3.3%
22 329 3.3%
Other values (65) 6513 65.1%

Minimum 5 values
Value Count Frequency (%)
1 3 < 0.1%
2 10 0.1%
3 58 0.6%
4 115 1.1%
5 144 1.4%

Maximum 5 values
Value Count Frequency (%)
90 1 < 0.1%
81 1 < 0.1%
80 1 < 0.1%
79 1 < 0.1%
78 1 < 0.1%

initial_list_status
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
f 9983 99.8%
m 17 0.2%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value Count Frequency (%)
Lowercase_Letter 2 100.0%

Unicode scripts
Value Count Frequency (%)
Latin 2 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 2 100.0%

mths_since_last_major_derog
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
2 3424 34.2%
3 3299 33.0%
1 3277 32.8%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value Count Frequency (%)
Decimal_Number 3 100.0%

Unicode scripts
Value Count Frequency (%)
Common 3 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 3 100.0%

policy_code
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
pc3 2098 21.0%
pc5 2025 20.2%
pc1 1978 19.8%
pc2 1962 19.6%
pc4 1937 19.4%

Length

Max length: 3
Mean length: 3
Min length: 3

Unicode categories
Value Count Frequency (%)
Decimal_Number 5 71.4%
Lowercase_Letter 2 28.6%

Unicode scripts
Value Count Frequency (%)
Common 5 71.4%
Latin 2 28.6%

Unicode blocks
Value Count Frequency (%)
ASCII 7 100.0%

cr_line_yrs
Real number (ℝ≥0)

Distinct count: 50
Unique (%): 0.5%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1997.014807
Minimum: 1970
Maximum: 2069
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1970
5th percentile: 1984
Q1: 1994
median: 1998
Q3: 2001
95th percentile: 2006
Maximum: 2069
Range: 99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 7.741003471
Coefficient of variation (CV): 0.003876287468
Kurtosis: 21.63770675
Mean: 1997.014807
Mean absolute deviation (MAD): 5.242684033
Skewness: 1.78954221
Sum: 19960163
Variance: 59.92313473
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
2000 839 8.4%
1998 748 7.5%
1999 715 7.1%
2001 642 6.4%
1997 601 6.0%
1996 592 5.9%
1995 518 5.2%
1994 513 5.1%
2002 503 5.0%
2003 455 4.5%
Other values (40) 3869 38.7%

Minimum 5 values
Value Count Frequency (%)
1970 14 0.1%
1971 11 0.1%
1972 13 0.1%
1973 18 0.2%
1974 14 0.1%

Maximum 5 values
Value Count Frequency (%)
2069 9 0.1%
2068 7 0.1%
2067 6 0.1%
2066 2 < 0.1%
2065 2 < 0.1%

cr_line_mths
Real number (ℝ≥0)

Distinct count: 12
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.8571
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 4
median: 7
Q3: 10
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.546200381
Coefficient of variation (CV): 0.5171574545
Kurtosis: -1.244569145
Mean: 6.8571
Mean absolute deviation (MAD): 3.09104728
Skewness: -0.1786189228
Sum: 68571
Variance: 12.57553714
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 4.5 8.5 11.5 12. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
10 1057 10.6%
11 999 10.0%
12 972 9.7%
9 923 9.2%
1 904 9.0%
8 794 7.9%
7 771 7.7%
6 740 7.4%
5 740 7.4%
2 728 7.3%
Other values (2) 1372 13.7%

Minimum 5 values
Value Count Frequency (%)
1 904 9.0%
2 728 7.3%
3 696 7.0%
4 676 6.8%
5 740 7.4%

Maximum 5 values
Value Count Frequency (%)
12 972 9.7%
11 999 10.0%
10 1057 10.6%
9 923 9.2%
8 794 7.9%

delinq_2yrs_bin
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
0 8915 89.1%
1 1085 10.8%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value Count Frequency (%)
Decimal_Number 2 100.0%

Unicode scripts
Value Count Frequency (%)
Common 2 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 2 100.0%

inq_last_6mths_bin
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
Value Count Frequency (%)
1 5393 53.9%
0 4607 46.1%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value Count Frequency (%)
Decimal_Number 2 100.0%

Unicode scripts
Value Count Frequency (%)
Common 2 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 2 100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables. This implies that for an exact linear relationship with non-zero slope, the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
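A minimal plain-Python sketch of this definition (the function name is illustrative). Because the 1/(n - 1) factors in the covariance and in the two standard deviations cancel, the code works with sums directly. The examples show that an exact linear relationship gives r = 1 or -1 whatever its slope, while a curved but monotonic one gives less than 1.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their
    standard deviations (the n - 1 factors cancel, so plain sums are used)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * a + 1 for a in x]))   # positive slope: 1.0
print(pearson_r(x, [-3 * a + 7 for a in x]))  # negative slope: -1.0
print(pearson_r(x, [1, 4, 9, 16, 25]))        # monotonic but curved: below 1
```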

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
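Because ρ is Pearson's r applied to ranks, a sketch only needs a ranking step in front of the Pearson computation. The version below assumes tied values receive the average of their ranks, the usual convention; the helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]    # y = x**3: monotonic but not linear
print(spearman_rho(x, y))  # 1.0
print(pearson_r(x, y))     # below 1
```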

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2, where n is the number of observations.
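The pair-counting definition translates directly into an O(n²) loop. This is a minimal sketch of the variant described above (often called τ-a). Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    A pair (i, j) is concordant when x and y move in the same direction and
    discordant when they move in opposite directions; pairs tied in x or y
    count as neither but still appear in the denominator.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))        # 2 concordant, 1 discordant: 1/3
```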

Missing values

Sample

First rows

is_bad emp_length home_ownership annual_inc verification_status pymnt_plan purpose_cat zip_code addr_state debt_to_income delinq_2yrs inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status mths_since_last_major_derog policy_code cr_line_yrs cr_line_mths delinq_2yrs_bin inq_last_6mths_bin
0010mortgage50000.0not verifiednmedical766xxtx10.870.00.00.011915.001208712.144.0f1pc41992.01200
101rent39216.0not verifiedndebt consolidation660xxks9.150.02.00.01194.001011464.05.0f2pc12005.01101
204rent65000.0not verifiedncredit card916xxca11.240.00.00.01194.00810.68.0f3pc41970.0600
3010mortgage57500.0not verifiedndebt consolidation124xxny6.181.00.016.01196.001003037.123.0f2pc21982.0910
4010mortgage50004.0verified - incomendebt consolidation439xxoh19.030.04.00.01198.001074040.421.0f3pc31999.01001
504rent47028.0verified - incomenother200xxdc7.832.01.019.01196.00171526.425.0f3pc31999.01211
6010mortgage126000.0not verifiedncredit card103xxny14.280.00.00.011918.00546611.129.0f3pc11979.01100
706mortgage42000.0verified - income sourcendebt consolidation891xxnv10.290.00.00.01199.001035495.910.0f3pc32006.0400
802mortgage50000.0verified - incomendebt consolidation612xxil15.360.02.00.011911.001966259.227.0f1pc52001.0201
901rent40000.0not verifiedncar926xxca6.480.01.00.011911.001999818.323.0f1pc51995.0501

Last rows

is_bad emp_length home_ownership annual_inc verification_status pymnt_plan purpose_cat zip_code addr_state debt_to_income delinq_2yrs inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status mths_since_last_major_derog policy_code cr_line_yrs cr_line_mths delinq_2yrs_bin inq_last_6mths_bin
9990010mortgage120000.0verified - incomenhome improvement481xxmi14.441.00.04.011914.001471659.831.0f2pc21994.0210
9991010rent63000.0verified - income sourcenmedical018xxma10.080.00.00.01196.00601.122.0f3pc11989.0500
9992010rent52000.0verified - incomendebt consolidation124xxny23.700.00.070.01198.001500291.518.0f2pc51998.0800
9993010own95892.0verified - incomenhome improvement110xxny8.700.02.00.01193.00213930.67.0f3pc51995.0701
999411rent24996.0verified - income sourcendebt consolidation913xxca3.790.00.00.01192.00480156.57.0f1pc12005.0800
999505mortgage66250.0verified - incomenwedding014xxma9.400.01.00.01198.00365624.110.0f2pc32001.0901
999601rent26000.0verified - income sourcendebt consolidation112xxny20.490.01.079.01198.00670958.912.0f2pc32000.0501
999708rent47831.0not verifiedndebt consolidation070xxnj24.130.00.00.01119.011134660.717.0f3pc31989.01200
999806mortgage70000.0not verifiednmajor purchase244xxva16.182.02.016.01199.001715750.927.0f2pc31999.0311
999901rent70560.0not verifiedncredit card900xxca16.130.01.053.011915.00230422.634.0f2pc52000.0901